Abstract

Remembering the information in a text is different from learning from a text and applying the acquired
knowledge (e.g., by making inferences). This distinction was investigated with a dissociation paradigm.
After reading an expository text, subjects performed either a memory (recognition) or an inferencing
(verification) test. The effects of the same variables on performance on the two tasks were
compared. Text organization tended to affect recognition but not verification test performance. When
verifying nonstudied items by inferencing, the richness of the available text information and the type of
processing required to make the inference were important. The educational implications of this
dissociation between memory and inference tests are discussed.
ELABORATIVE INFERENCES ON AN EXPOSITORY TEXT



One of the purposes of reading expository texts is to learn from them. Learning from a text does not
necessarily mean comprehending or even memorizing the text, but rather being able to use and apply
the information presented in it. Our goal in this report is to demonstrate that comprehending and
memorizing a text do not necessarily imply understanding it well enough to use and manipulate the
information in it. One way to gauge how well readers understand the information in a text is to
determine whether they apply the studied information to make elaborative inferences.

When reading a text, a reader is never given all the information that is necessary to form a coherent
representation of that text and all of its possible implications and elaborations. Instead, the reader
needs to rely on his or her background knowledge and on the cues from the text to draw inferences, both
to close the gaps in the presented information and to go beyond the information actually
presented. Therefore, from an educator's point of view, the ability of a reader to determine the missing
information and to elaborate beyond the available information by making inferences is a prerequisite
of reading comprehension (Anderson & Pearson, 1984; Bransford, 1979; Pearson, Hansen, & Gordon,
1979).

Recently, investigators have begun to distinguish between several different kinds of inferences a reader
can generate (e.g., Graesser, Haberlandt, & Koizumi, 1987; Long, Golding, Graesser, & Clark, 1990;
Potts, Keenan, & Golding, 1988). According to these researchers, bridging inferences are necessary to
form a coherent representation of the text, and they are essential for comprehension. It is likely that
such inferences are generated online, that is, while a reader is reading the text. In contrast, elaborative
inferences are not essential for a coherent integrated representation of the text; rather, they "refine
or embellish the text representation" (Long et al., 1990). Such inferences are more likely to be
generated at retrieval, for example, when the inference is necessary for a reader to answer a question
about the information in the text (McKoon & Ratcliff, 1990; Whitney, 1987). That is, elaborative
inferences are not readily available in memory for direct retrieval but need to be computed given the
demands of the (testing) situation. This does not mean, however, that elaborative inferences are trivial
or unimportant, because these inferences involve going beyond the text and deriving unspecified text
aspects such as consequences of events (McKoon & Ratcliff, 1986; Potts et al., 1988) and specification
of ongoing states (Seifert, Robertson, & Black, 1985). Indeed, as research on transfer of knowledge
indicates, a crucial aspect of knowledge transfer is being able to use the presented information and
understand its implications rather than merely remembering the text (e.g., Kintsch, 1986; Spiro, Vispoel,
Samarapungavan, Schmitz, & Boerger, 1987). Hence, elaborative inferences are important for a deeper
understanding of the presented material.

The research reported here had two goals: to demonstrate that elaborative inferencing involves
processes that are different from those involved in direct retrieval of propositions from memory, and
to examine the variables affecting elaborative inferences and discuss their educational implications. The
first goal has long been intuitively obvious to educators who emphasize that learning from
text cannot be equated with remembering the text. A student may be able to remember all the
information in a passage and yet be unable to transfer this knowledge and apply it to new situations
(Spiro et al., 1987), as indicated by problems in embellishing, extending, and applying information in the
text, that is, in drawing elaborative inferences. Therefore, it is important to investigate (a) the kinds of
variables that affect the elaborative inferencing process and (b) how these variables affect inferencing
tests differently as compared to memory tests.

Reder and Kintsch and their respective colleagues (Kintsch, 1986, 1988; Perrig & Kintsch, 1985; Reder,
1982, 1987; Reder, Wible, & Martin, 1986) have begun to investigate this latter issue. Because the
present research design is based on the findings of these researchers, we first briefly summarize the
findings of their studies.
In two experiments by Reder (1982), subjects first read a story, followed by the presentation of the test
probes (single sentences). One group of subjects decided whether the test probe had
been presented in the story, that is, they made a recognition judgment, whereas another group of
subjects decided whether a test probe was plausible given the story, that is, they made a plausibility
judgment. In other words, the first group had a memory task and the second group had a task that
required some inferencing.

Reder manipulated the delay between the presentation of the stories and the test probes. The
recognition judgments were strongly affected by the delay between the initial reading and the test.
As memory became poorer with delay, both the accuracy and the speed of the recognition judgments
dropped considerably. In contrast, the plausibility judgments were not affected. More
important, except in the no-delay condition, plausibility judgments were faster and more accurate than
were the recognition judgments. Consequently, Reder (1^ Reder et al., 1986) suggested that
searching memory for a proposition is a different process than computing an inference. She also
convincingly argued that when verifying the truth of statements, judging plausibility by making inferences
is a more efficient strategy than direct retrieval of facts stored in memory.

Even though their theoretical orientation is somewhat different, Perrig and Kintsch (1985; Kintsch, 1986)
also suggested that memory retrieval and inferencing tests are not alike. In their experiments, they
presented the description of a town either in geographical terms, in what they call the "survey text" (e.g.,
the church is north of the inn), or in a driving-instructions format, in what they call the "route text" (e.g.,
coming from the church you make a left turn to reach the inn). Because the overall coherence of the
route text was higher, the subjects in that group recalled more information. However, when they were
asked to make inferences to verify a spatial orientation that was not explicitly mentioned in the text (e.g.,
the highway is south of the church), both groups had similar levels of performance. The critical variable
affecting the inference task was the congruency between the type of text read and the type of inference
item presented for verification. The subjects who had the geographical description were better on
inference statements given in geographical terms, whereas the subjects who had the route text performed
better on items written in the route description format. (Although the gender of the subjects interacted
with this finding, it is not important for our purposes.) In short, performance on direct memory and
on inference tasks was affected by different variables. The coherence of the text affected performance
on the memory test, whereas the congruency between the type of text and the type of test questions
affected inferencing performance. However, instead of proposing different processes involved in
memory retrieval and inference tasks (as Reder had suggested), Perrig and Kintsch postulated that each
type of task relies on a different type of representation. They suggested that a propositional
representation of a text is sufficient for free recall or recognition tests. However, to make inferences,
a more global situation model (van Dijk & Kintsch, 1983) or "mental model" (Johnson-Laird, 1983) is
required.

Our goal is not to compare Reder's process model and Kintsch's representational model, because, as Anderson
(1978) has argued, it is quite difficult to separate the question of representation from the question of
processing. It is possible, for example, that the kind of information in the situation model proposed by
Kintsch and his colleagues invites more plausibility computations, whereas the propositional
representation of the text invites more direct retrieval. Our goal, rather, is to explore further the
difference between memory and inference tasks using a dissociation paradigm under well-controlled
conditions. The dissociation paradigm involves comparing the effects of the same variables across the
tasks of interest. To obtain unambiguous results with such a paradigm, the conditions during study and
test phases need to be as equivalent as possible across the two tasks (Neely, 1989).

Reder investigated the effects of identical variables on two different tasks, but used only short stories.
Perrig and Kintsch used an expository text, but they did not systematically observe the effects of identical
variables across the memory and inference tasks (because this was not their main focus). In our study, we
have attempted to fill these gaps in the currently available data. That is, we used a relatively long
expository text and manipulated the same variables across both a memory and an inference task. We
kept the study conditions identical across the two tasks and compared performance on the two
tasks on identical materials.

We used a recognition test as the memory test and a sentence verification test as the inference test. The
recognition test did not require generating inferences, because subjects were asked to judge whether a
given statement had been explicitly studied before. In terms of Reder's model, direct retrieval was encouraged.
Also, because the delay between the acquisition and the retrieval phases of the experiment was not very
long, it was unlikely that subjects in this task would try to generate inferences. In contrast, the
verification test required judging whether a statement was true or false. Instructions for the verification
test emphasized that a statement could be true or false irrespective of whether it had been studied
before. In sum, the only difference between the two tasks was in the instructions given after the subjects
read the text: The memory group was asked to judge whether the given test statement had been studied,
whereas the inference group was asked to judge whether the same statement was true.

We compared the effects of the same two variables, described below, across these two tasks. If a variable
has similar effects on performance in recognition and verification tests, then there will be no evidence
for a dissociation between the two tasks. Such a result would indicate that the elaborative inferencing
process involved in verifying a statement is similar in nature to the processes involved in direct retrieval
from memory. In contrast, differing effects of a variable on the two tasks would indicate that the two
tasks differ. Finding such a dissociation between the two tasks has educational implications. If memory
and inference tasks are affected differently by several variables, then it would be a sound educational
strategy to emphasize variables that affect inferencing just as much as those that affect memory.

The text used in our experiment was an expository passage about an ancient civilization (Phoenicians)
that was quite unfamiliar to the undergraduate students in our subject pool. In a pilot experiment with
different subjects from the same pool, we had asked five open-ended questions, worth 2 points each,
about the Phoenicians (e.g., when and where they lived, their economy, their contribution to modern
civilization), as well as direct questions about whether the students had ever studied the
Phoenicians. The 32 subjects in the pilot group had an average score of 1.12 out of 10 on the five
questions. Only ji2 subjects reported ever studying the Phoenicians, but 7 of those individuals
qualified their answer by saying that they had studied the Phoenicians very briefly or that the Phoenicians
were only mentioned in their history class. In short, our pilot data indicated that the topic of our texts
was quite unfamiliar to the subjects in our pool. The reason for selecting an unfamiliar topic was to
discourage the use of background information and to encourage subjects to rely on the presented
information to draw their inferences. As Ceci and McNeills (1987) have discussed, in some experiments
the differences observed in processing may actually be caused by differences in the knowledge base.
Hence, choosing an unfamiliar text reduces (but does not completely eliminate) the problem of differing
knowledge bases accounting for differences observed in the inferencing process.

The first variable manipulated across the recognition and verification tasks was the style of presentation
of the text. Subjects read the text in one of three formats. For the "structured" group, the text was
hierarchically organized under headings such as "History," "Culture and civilization," and so on. This
group read the same structured text twice. For the "unstructured" group, the text was coherent, but it
was not organized hierarchically. To create this version, the paragraphs in the hierarchically organized (structured)
text were reordered and the headings were removed. This group read the same unstructured text during
the two presentations. Finally, for the "mixed" group, the passage was structured during the first
presentation but unstructured during the second presentation.

The presentation style was predicted to affect memory for the text. Research on text structure has
indicated that subjects can recall more information (especially main ideas) from well-organized
expository texts than from scrambled texts (Taylor & Samuels, 1983; Richgels, McGee, Lomax,
& Sheard, 1987). In addition, well-organized texts are more helpful when the topic is unfamiliar (for
a review, see Roller, 1990). However, when the same scrambled passage is given twice, some
improvement in memory occurs (Danner, 1976). In our study, reading the text twice organized the same
way (structured or unstructured during both presentations) was predicted to lead to better memory for
the material than a mixed presentation, and this better memory, in turn, was predicted to help in the
recognition test. However, in the verification test such direct retrieval of information is not required,
and hence the memory manipulation should have little effect on this test (see also Perrig & Kintsch,
1985).

The second variable compared across the two tasks was the nature of the inference items. As discussed
before, readers do not seem to generate many elaborative inferences while reading a text, but do so
during retrieval, for example, in response to a test question (for a review, see Weaver & Kintsch, 1990).
This implies that both the richness of the text information available for use in inferencing and
the nature of the test questions that necessitate inferencing are important. Hence, in our study,
following the study of the texts, we gave our subjects test statements and asked for recognition or
verification judgments. The nonstudied items that necessitated inferencing were classified on two
dimensions, labeled richness and distance. The "richness" dimension reflected the richness of the text
information available to generate inferences. An inference could be drawn based on a limited set of
information in the text (i.e., a Restricted item) or based on a richer database from the text (i.e., a Broad
item). The richness of the sources available to make an inference should affect the verification task.
In fact, Anderson and Reder (1987) demonstrated that subjects were faster in making sensibility
judgments if they knew many facts about a concept. However, the richness dimension is not predicted
to affect the recognition task, because this task does not require manipulating the information in the
text to draw inferences.

The "distance" dimension reflected the type of inferencing that was required by the test question.
Following the framework of Armbruster and Ostertag (1989), we classified each inference item as Near
or Far in terms of the type of inference necessary to verify the statement. Armbruster and Ostertag were
dissatisfied with classification schemes that identify a question only as textually explicit or implicit, because
the textually implicit category embodied many different kinds of inferences. Instead, Armbruster and
Ostertag used a modification of the taxonomies of Bloom and his colleagues (Bloom, Engelhart, Furst,
Hill, & Krathwohl, 1976) and of Barrett (1976) to reflect the differential cognitive processing required
by different types of inferences. They distinguished between inferences that require combining
information across a text and inferences that require applying information to a novel situation or
that involve predictions, hypothesis generation, and analogies. For our purposes, we classified the first type of
inference as Near and the latter two as Far. That is, if the inference only
involved putting together the explicit information across the text to verify a statement, then it was
classified as a Near item. If an item required applying the studied information to a novel situation or
making predictions about a situation, then it was classified as a Far item. It was predicted that Far
items would lead to better recognition performance than Near items, because the inferred information
could be easily distinguished from actually studied information, whereas the opposite pattern was
predicted for the verification test.

To summarize, we predicted that the nature of an inference item (both the "richness" and "distance"
dimensions), as well as the style of text presentation, would affect recognition and verification tasks
differently, thus producing a dissociation.

Method 

Subjects

Subjects were 110 undergraduates participating to fulfill a course requirement. All were native English
speakers. There were 55 students in each of the two tasks, verification and recognition.
Design and Materials

In the experiment, test format (recognition vs. verification) and presentation format (structured,
unstructured, or mixed) were between-subjects variables, and the type of inference item was a
within-subjects variable.

The basic text was a passage about the Phoenicians compiled by the experimenters from several ancient
history texts. It was approximately 1,300 words long. In the structured text, there were five sections
under the following headings: I. Introduction; II. People and region; III. History; IV. Culture and
civilization; V. Industry, commerce, and exploration. In the unstructured text, identical material was
included, but the headings were eliminated. Also, in the unstructured text, the paragraphs were
reordered, but without compromising the coherence of the text. (Copies of the texts are available from
the first author.)

The tests used at the end of the experiment were created by the experimenters. There were 44
statements on the verification test. Subjects decided whether a statement was true or false based on the
text they had read. Of these 44 items, 22 were inference (i.e., nonstudied) items, because the
answers to these questions were not explicitly stated in the text; instead, the answer needed to be
determined using different sources of information from the text. For example, the item "Phoenicians
were expert shipbuilders" was considered an inference question because the text did not include this
information. This statement is true because several sources of information in the text support it. The
text contains information that Phoenicians lived by the sea, did extensive sea travel as traders, had many
different kinds of ships, had access to forests, and were good woodworkers. By combining these various
pieces of information, one can answer this inference question as "true." The answers to the inference
questions can be found in history books. However, they were not included in our text, but rather
left to our subjects to infer. In short, the "correctness" of a response to an inference question could be
judged by comparing a subject's response to the answer found in the books.

The remaining 22 statements on the verification test were called studied (i.e., old) items, because their
answers were explicitly available in the text. The subjects did not need to combine various pieces of
information across the text to verify these statements. For example, the statement "In the Phoenician
alphabet, pictures were used to depict words similar to the Egyptian hieroglyphs" is false because the
text explicitly mentions that the Phoenician alphabet consisted of 22 consonants that reflected the sounds
of the language. However, the studied items were not verbatim copies of the material in the texts;
rather, they were conceptually identical but paraphrased versions of the information in the
text. Of the 22 inference items, 13 were true and 9 were false. Of the 22 old (studied) items,
14 were true and 8 were false.

The same items were included in the recognition test, except that, following Reder (1982), the false items
were excluded. This was done to discourage subjects from first making plausibility judgments in the
recognition test and thus using the decision strategy that an item must be nonstudied if it is false. There
were 27 true items (14 studied and 13 inference) in the recognition test.

All inference (i.e., nonstudied) items were classified a priori on the two dimensions of distance and
richness. The distance dimension reflected whether the required inference could be generated by
combining information across the text (i.e., Near) or by making predictions and/or applying the
studied information to novel situations (i.e., Far). For example, the statement "Phoenicians were highly
influenced by the Egyptian culture" was classified as a Near item, because the reasoning chain only
involves combining several pieces of information in the text. The text mentions that the Phoenicians
lived under Egyptian rule at one time, and that they improved the glass-making techniques they
learned from the Egyptians. Consequently, the inference reasoning chain can go as follows: Phoenicians
lived under Egyptian rule, Phoenicians learned how to make glass from the Egyptians, therefore they
must have been influenced by the Egyptians, and this statement must be true.
An example of a Far item is the statement "Finding Phoenician products such as jewelry, trinkets, and
silver bowls in an area is good archaeological evidence for a Phoenician settlement or city in that area."
To verify this statement, the inference chain can go as follows: The text mentioned that Phoenicians
were craftsmen and traders. The text also mentioned that they handcrafted many products such as
jewelry, trinkets, glass, etc., and that they traded these products all around the Mediterranean. Another
civilization could have bought and used the Phoenician products. Therefore, finding such products is
not good evidence for a city or a settlement, because the products may have been traded goods. One
needs better evidence to decide that an ancient Phoenician city existed in an area, and hence this
statement is false. In other words, the information about Phoenicians is used to generate hypotheses
and/or to predict the kind of archaeological evidence that indicates a Phoenician settlement.

The second dimension, richness, reflected the number of sources of information that could be used to
answer the inference question. If a test item had many sources of information in the text that could
converge on the inference, then we called it a Broad item. In contrast, if the sources of information in
the text that were useful for making the inference were very few, then we called it a Restricted item.
For example, the statement about Phoenicians being expert shipbuilders was classified as a Broad item
because there were many sources of information about the forests, woodworking, trading, living by the
sea, etc., that permitted an inference. In contrast, the item "The traditional Phoenician garment was a
long, white starched dress" was classified as a Restricted item. The only information in the text that
allows subjects to correctly classify this statement as false is that Phoenicians had a very well-developed
textile-dyeing industry.

To classify the items a priori on the richness dimension, we presented the text and the verification test
to our pilot subjects (n = 32). All subjects decided whether a statement was true or false. In addition,
half explained their reasons, using an If . . . Then . . . type of construct. The remaining subjects listed
the pieces of information in the text that could be used to answer each question. Both groups of
subjects could always look back at the text as they were completing the verification test. We identified
the number of propositions from the text the subjects mentioned for each of the inference items. If,
across all 32 subjects, an item had five or more distinct propositions used to determine the answer, then it
was classified as a Broad item; otherwise it was classified as a Restricted item.
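As an illustration only, the a priori richness rule can be written as a short Python function. This sketch assumes that the "five or more distinct propositions" are counted by pooling, across all pilot subjects, the propositions each subject cited for an item; the proposition labels and the names `classify_richness` and `classify_items` are hypothetical and not part of the original materials.

```python
from typing import Dict, Iterable, List, Set

# Threshold from the text: five or more distinct propositions -> Broad.
BROAD_THRESHOLD = 5


def classify_richness(citations: Iterable[Set[str]],
                      threshold: int = BROAD_THRESHOLD) -> str:
    """Classify one inference item as 'Broad' or 'Restricted'.

    `citations` holds, for each pilot subject, the set of text propositions
    (hypothetical string labels) that subject cited for this item. We assume
    distinct propositions are pooled across subjects before counting.
    """
    pooled: Set[str] = set()
    for subject_props in citations:
        pooled |= subject_props  # union: a proposition counts once
    return "Broad" if len(pooled) >= threshold else "Restricted"


def classify_items(items: Dict[str, List[Set[str]]]) -> Dict[str, str]:
    """Apply the richness rule to every inference item."""
    return {item: classify_richness(cits) for item, cits in items.items()}


if __name__ == "__main__":
    # Made-up citations for illustration, not data from the study.
    demo = {
        "expert shipbuilders": [
            {"lived by the sea", "sea traders"},
            {"many kinds of ships", "access to forests"},
            {"good woodworkers", "sea traders"},
        ],
        "white starched garment": [
            {"textile-dyeing industry"},
            {"textile-dyeing industry"},
        ],
    }
    print(classify_items(demo))
    # -> {'expert shipbuilders': 'Broad', 'white starched garment': 'Restricted'}
```

Pooling with a set union means a proposition cited by many subjects still counts once, which is one reasonable reading of "distinct propositions"; counting per subject would be a different rule.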

The background knowledge of the subjects was assessed by a short questionnaire. On this questionnaire
there were two questions about the subjects' history education and four very general questions about
ancient civilizations (e.g., Where did Alexander the Great live? What is the greatest contribution of
King Hammurabi of Babylon? What happened to the Roman Empire in 476 A.D.?). We did not
include any specific questions about the Phoenicians, because we did not want to affect the initial
reading strategies of our subjects. In terms of testing logistics, this also gave us the advantage of being
able to give this background test even after our subjects had read the text about the Phoenicians. We
believe that individuals who cannot answer simple, well-known questions about Greeks and Romans are
less likely to know about the Phoenicians. An examination of several history books suggests that,
compared with the Greeks and Romans, the space allotted to the discussion of Phoenicians is very limited.
As an example, in the 750-page volume A History of the Ancient World (Starr, 1965), the discussion of
Phoenicians is only seven pages long. For our pilot subjects who took both this general background test
and answered specific questions about the Phoenicians, the correlation between performance on the
two tests was significant, r = 0.56 (n = 32, p < .0001). Hence, we believe that the general background
test is a good predictor of whether our subjects are familiar with ancient civilizations, especially the
Phoenicians. The score on this test was used as a covariate in all the analyses reported in the Results
section.

Procedure 

Subjects first completed a short test asking them to match names of places and civilizations around the
Mediterranean to acclimate them to the topic. They then studied the text about the Phoenicians for 10
minutes. For the structured and mixed groups, this text was structured, but for the unstructured group,
it was not. Then the subjects answered the questions on the background test assessing their knowledge
about ancient history. Following the background test, they read the text for the second time for 5
minutes. During this presentation, the structured and unstructured groups read the texts they had
studied before, whereas the mixed group read an unstructured text. Following the study of texts,
subjects had a filler task of completing a learning style and attitude questionnaire for 5 minutes. Finally,
they took either the verification or the recognition test. For the group performing the recognition test,
it was emphasized that the statements would not be identical to the material in the text (see note 1),
but they were to mark an item as "studied" if that information was explicitly described in the text. In
addition, subjects were told that all of the items on the recognition test were true. Therefore, they did
not need to use the strategy that if an item is false, it must be nonstudied (Reder, 1982). In short,
subjects needed only to distinguish between studied and nonstudied information. After making a decision
about the study status of an item, the subjects were asked to rate how confident they were of their
decision (1 = very uncertain, 5 = very certain), although the confidence ratings were not analyzed.

For the subjects performing the verification test, it was stressed that an item could be true or false
independent of whether it had been studied before. The subjects in this group wrote a short statement
explaining the rationale for their true/false decision.

For the recognition group, a score of 1 indicated a hit or a correct rejection and a score of 0 indicated
a false alarm or a miss. For the verification group, a score of 1 indicated a correct response and a score
of 0 indicated an incorrect response. The authors scored the sheets independently (interrater reliability
= .98), and any conflicts were resolved by discussion.
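The scoring rule reduces to checking whether a response matches the item's status. The following Python sketch is illustrative only. It assumes each recognition response is coded as a pair of booleans (whether the item was studied, and whether the subject marked it "studied"). The function names, and summarizing a subject by proportion correct, are our assumptions rather than the authors' procedure.

```python
from typing import List


def score_recognition(item_was_studied: bool, marked_studied: bool) -> int:
    """Recognition: 1 for a hit or correct rejection, 0 for a miss or false alarm."""
    # Hit: studied item marked "studied". Correct rejection: inference
    # (nonstudied) item marked "nonstudied". Both are a match between
    # the item's status and the response.
    return int(item_was_studied == marked_studied)


def score_verification(item_is_true: bool, judged_true: bool) -> int:
    """Verification: 1 for a correct true/false judgment, 0 otherwise."""
    return int(item_is_true == judged_true)


def proportion_correct(scores: List[int]) -> float:
    """Mean of one subject's 0/1 item scores (an assumed summary measure)."""
    if not scores:
        raise ValueError("no scored items")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Made-up recognition responses: (item_was_studied, marked_studied).
    responses = [(True, True), (True, False), (False, False), (False, True)]
    scores = [score_recognition(s, m) for s, m in responses]
    print(scores, proportion_correct(scores))  # -> [1, 0, 1, 0] 0.5
```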

Results and Discussion

Separate statistical analyses were performed on the data from each task. All results are reported at the
.05 level of significance unless otherwise indicated. The results are organized around the two variables
of interest: presentation type and item effects.

Effects of the Presentation Type 

The effects of the type of text during study, structured-structured (SS) for the structured group, 
structured-unstructured (SU) for the mixed group, unstructured-unstructured (UU) for the unstructured 
group, were examined for both tasks using the score on the background knowledge questionnaire as a 
covariate. 

Verification task. The verification scores were analyzed in an analysis of covariance (ANCOVA), with
response type (true vs. false) and study status (studied vs. inferred) as within-subjects variables and
presentation type (SS, SU, UU) as the between-subjects variable. The results, as well as the number
of subjects in each condition, are presented in Table 1. The effect of study status was significant,
F(1, 52) = 182.56, MSe = .02, indicating that the studied items were verified more accurately than inferred
items (.848 and .622, respectively). Neither the main effect of presentation type nor its interactions
with response type and study status were significant, all Fs < 1.

To summarize, presentation type, which can presumably influence memory for the text, did not
affect performance on the verification task. Hence, how well the text was remembered was not
a significant variable in the verification task.

[Insert Table 1 about here]

Recognition task. Because only true items were included in the recognition test, response type was no
longer a variable; thus only study status and presentation type were included in the analyses.



The results are presented in Table 2. Hit and false alarm rates, indicating correct recognition of studied
items and false recognition of inferred items, respectively, were analyzed separately with presentation
type (SS, SU, and UU) as the between-subjects variable. When the hit rates were considered, the effect
of presentation type was not significant, F < 1, indicating that all subjects were equally adept at
recognizing studied items.

[Insert Table 2 about here] 

However, the false alarm rates showed a different pattern. Here the three groups differed, F(2, 51) =
3.16, MSe = SXll. The groups that studied the same text twice (SS and UU) had low false alarm rates,
.266 and .251, respectively. The group that studied two versions of the text (SU) had a significantly
higher false alarm rate (.367), indicating poorer memory for the text and more difficulty in rejecting
nonstudied items.

Across the recognition and verification tasks, text presentation did not produce very strong effects.
However, whatever effects of text presentation there were appeared only in recognition performance,
not in verification performance.

Item Effects

The test items that required generating an inference rather than directly retrieving intact
information from memory were analyzed further using the dimensions of richness and distance. Because
there were only true items in the recognition task, the false items were excluded from the verification
task so that the same inference (nonstudied) items could be compared across the two tasks. However,
the distance and richness effects for both true and false items in the verification task are also
discussed at the end of this section.

Recognition task. Separate analyses were performed on the correct rejection rates of inferred
(nonstudied) items, with presentation type as the between-subjects variable and distance as the within-
subjects variable in the first analysis and richness as the within-subjects variable in the second. The
results are presented in Table 3. In the first analysis, presentation type yielded no main effect,
F(2, 51) = 2.47, MSe = .04, and did not interact with the distance variable, F(2, 52) = 1.52, MSe = .03.
However, there was a significant main effect of distance, F(1, 52) = 70.76, MSe = .03, indicating that
performance on the Far items was much better than performance on the Near items (.903 versus
.645, respectively). As expected, it was harder to reject inferred items as nonstudied when they were
conceptually closer to the actually studied material.

[Insert Table 3 about here] 

The analysis with the richness variable yielded no significant effect of presentation type, F(2, 51) = 2.45,
MSe = .07, or a Presentation Type x Richness interaction, F(2, 52) = 1.30, MSe = .03. There was a
marginally significant effect of the richness variable, F(1, 52) = 3.01, MSe = .03, p < .09. Correct rejection
rates were .723 for Restricted items and .664 for Broad items. As predicted, the number of facts that
needed to be manipulated to compute an inference (i.e., richness) did not have a significant effect in the
recognition task. In fact, if anything, Restricted items tended to be rejected more accurately.

Verification task. A repeated measures analysis with distance (Near or Far) as the within-subjects
variable and presentation type as the between-subjects variable yielded no main effect of presentation
type and no interaction of presentation type and distance, both Fs < 1. There was a significant effect of
distance, F(1, 52) = 16.80, MSe = .04, with Near items being verified more accurately than Far items,
.649 and .485, respectively, indicating that it is easier to make an inference if the required analysis
involves integrating the material in the text rather than using or applying the studied material.

A separate analysis with the richness (Restricted or Broad) variable yielded no main effects or
interactions of presentation type either, both Fs < 1. There was a significant main effect of richness,
F(1, 52) = 36.01, MSe = .04. As expected, performance on the Broad inference items, which could be
verified using many sources of information from the text, was considerably more accurate than
performance on the Restricted inference items, which had only a small number of sources of information,
.773 and .539, respectively.

The same analyses were repeated, this time including both true and false items from the verification test.
The pattern was identical to that found when only true items were included. Presentation type did not
produce a main effect or interact with the distance variable, both Fs < 1.60. There was a main effect
of distance, F(1, 52) = 43.35, MSe = .03, with Near items being verified more accurately than Far items,
.675 and .476, respectively.

Presentation type did not interact with the richness variable either, F < 1, but there was a main effect
of richness, F(1, 52) = 72.55, MSe = .02, with Broad items verified more accurately than Restricted items,
.790 and M2, respectively.

Compared on the same inference items (all true), performance on the recognition and verification tasks
was affected differently by the type of inference item, indicating a dissociation between the recognition
and verification tasks. If the inference involved combining different sources of information from the text
(i.e., Near) rather than applying the information from the text to a new situation (i.e., Far), then
verification performance was better. This variable had the opposite effect in the recognition test: Near
items were rejected less accurately than Far items. As Neely (1989) has summarized, the most
convincing type of dissociation between two tasks is when a change in a variable (e.g., distance) improves
performance in one task but debilitates performance in the other task. A dissociation was also observed
with the richness dimension. Even though Broad items were verified more accurately than Restricted
items in the verification task, this variable did not produce any significant effects in the recognition test.
(If anything, there was a trend in the opposite direction, with Restricted items being rejected more
accurately than Broad items in the recognition task.)
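The crossover logic attributed to Neely (1989) can be made concrete with the Table 3 means. The following Python sketch is illustrative only: the function names are hypothetical, and it compares just the direction of each mean difference across the two tasks, without any significance test.

```python
# Mean accuracy on nonstudied (inference) items, copied from Table 3.
# Verification = proportion verified correctly;
# recognition = correct rejection rate.
TABLE_3 = {
    "verification": {"Near": .649, "Far": .485, "Restricted": .539, "Broad": .773},
    "recognition": {"Near": .645, "Far": .903, "Restricted": .723, "Broad": .664},
}


def effect(task: str, level_a: str, level_b: str) -> float:
    """Mean accuracy for level_a minus level_b within one task (hypothetical helper)."""
    return round(TABLE_3[task][level_a] - TABLE_3[task][level_b], 3)


def is_crossover(level_a: str, level_b: str) -> bool:
    """True when the same manipulation raises accuracy in one task and
    lowers it in the other, i.e., the two differences have opposite signs.
    Direction only: no significance test is performed here."""
    v = effect("verification", level_a, level_b)
    r = effect("recognition", level_a, level_b)
    return v * r < 0


for a, b in [("Near", "Far"), ("Broad", "Restricted")]:
    print(f"{a} - {b}: verification {effect('verification', a, b):+.3f}, "
          f"recognition {effect('recognition', a, b):+.3f}, "
          f"crossover={is_crossover(a, b)}")
```

A direction-only rule flags richness as a crossover too (+.234 in verification, -.059 in recognition), even though the recognition difference was only marginally significant in the analyses above. That is why the sign comparison has to be read together with the reported F tests rather than on its own.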

To summarize, if the required inference is closer to the text information in memory (distance variable)
or if many sources of information and several routes are available to compute the inference (richness
variable), then verification performance improves, but recognition performance does not.

General Discussion 

The results of the experiment can be summarized simply: The variables that affect performance on a 
recognition test are not similar to the variables that affect performance on a verification test. This 
finding conceptually replicates and extends the dissociations reported by Kintsch, Reder and their 
respective colleagues. 

On studied information, performance was quite good for both the recognition and the verification task
groups. This result indicates that for both tasks, the studied information was most likely retrieved from
memory. Hence the text presentation manipulation, which presumably affected memory for the text,
did not produce any strong dissociations between the two tasks on studied items. However, the two
tasks required different processes on nonstudied (inference) items. Recognition judgments could
still be made by checking memory to see whether a proposition had been studied. In contrast, an inference
needed to be computed to make a verification judgment. Our results indicate that both the text
presentation and the richness and distance dimensions affected the two tasks differently on those
nonstudied items. Although the effect was not very strong, mixed text presentation caused more false
alarms but did not affect verification judgments. We will discuss the effects of the richness and distance
dimensions in more detail below.

Online research has indicated that elaborative inferences are not routinely made while reading a text.
Instead, such inferences are usually generated as a function of the comprehension questions that are
asked. Under these circumstances, both the information available in memory that can be used to
generate inferences and the nature of the questions that invite inferencing become important.
If the information in memory is rich enough (i.e., broad, not restricted), then verification performance
improves considerably. In contrast, the richness of the available information does not improve
recognition performance. In our study, we deliberately chose a topic that was unfamiliar to our subjects
so that the inferences would be based on the information in the text rather than on their background
knowledge. However, if a reader has enough background information on a topic, even an impoverished
text can be represented richly and lead to better inferencing. For example, Yekovich, Walker, Ogle, and
Thompson (1990) demonstrated that low-ability high school students were much better at drawing
inferences and determining the main idea of a passage if they had a lot of background information on
a topic. In contrast, recalling factual information from a passage was much less affected by the level
of background knowledge. This pattern conceptually replicates our finding that the richness of the
available information affects inferencing, but not direct retrieval from memory.

The nature of the question that necessitates inferencing is another dimension of interest. Questions
that require integrating material across a text are verified more accurately than questions that require
applying the knowledge (near vs. far). This pattern demonstrates a difficulty articulated by educators
many times: Students have a lot more difficulty applying and transferring studied information to new
contexts than integrating studied information (for a review, see Spiro et al., 1987). However,
the distance dimension has the opposite effect on recognition performance. Nonstudied but Near items
possibly overlap more with the information in the text, which makes them harder to reject as
nonstudied.

The dissociation between memory and inferencing tasks has another educational implication. One of
the important issues in education is reading texts in order to learn from them. Especially with
expository texts, the goal is not so much to recall a text well as to learn from the text and apply this
knowledge. Hence variables that improve inferencing performance should be emphasized just as much
as variables that improve only memory performance. Along the same lines, it is interesting to note
that educational researchers have mostly focused on variables that affect comprehension and recall of
a text rather than on variables that affect learning from text (Kintsch, 1986).

As the dissociations in the studies by Perrig & Kintsch (1985) and Reder (1982), as well as in our study,
indicate, having a good memory of the studied information does not necessarily mean that the studied
information can be used to make inferences. The challenge for an educator is to distinguish between
educational practices that improve memory of a text and those that improve inferencing, because
these two outcomes, remembering a text well and learning from a text, are not facilitated by
identical variables.

Footnotes

¹ Recall that the studied items were paraphrased versions of the information in the text. This
manipulation actually works against our predictions by bringing the recognition test closer to the
verification test and reducing the differences between the two tasks, thus making it more difficult for us
to find dissociations between the two tasks.

² We have also scored the verification test data considering both the accuracy of the true/false
decision and the accuracy and completeness of the given rationale. The overall pattern of results did not
change.

Table 1

Mean Scores (and Standard Deviations) on the Verification Task as a Function of
Presentation Type

                          Presentation Typeᵃ
                    SS             SU             UU
                    (n = 18)       (n = 19)       (n = 18)

Studied items       .840 (.119)    .839 (.133)    .866 (.120)
Nonstudied items    .606 (.147)    .627 (.205)    .636 (.159)

ᵃ S = Structured text; U = Unstructured text

Table 2

Mean Scores (and Standard Deviations) on the Recognition Test as a Function of
Presentation Type

                          Presentation Typeᵃ
                    SS             SU             UU
                    n = 19  n = 18

Studied itemsᵇ      .785 (.148)    .759 (.117)    .728 (.184)
Nonstudied itemsᶜ   .266 (.164)    .367 (.156)    .251 (.180)

ᵃ S = Structured text; U = Unstructured text
ᵇ Correct acceptance, hit rates
ᶜ Incorrect acceptance, false alarm rates
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Table 3 

Mean Scores (and Standard Deviations) on tlie Nonstudied (Inference) Items as a 
Function of Test and tlie Type of Item 



Distance Richness 

Near Far Restricted Broad 

Verification' .649 (.176) .485 (.263) .539 (.182) .773 (.247) 

Recognition** .645 (.206) .903 (.166) .723 (.167) .664 (.277) 



'= Verification scores 
Correct rejection rates 


